Support multiple Fireworks deployments independently by jahooma · Pull Request #514 · CodebuffAI/codebuff

jahooma · 2026-04-19T22:34:58Z

Summary

The waiting room now admits users if any Fireworks deployment is healthy (previously it took the worst status across all deployments). With one deployment per model (and per country in the future), a degraded deployment for one model shouldn't block users whose model routes elsewhere.
DEPLOYMENT_SCALING_UP cooldown is now per-deployment (keyed by deployment path), so one deployment's 503 no longer poisons routing for the others.
Replicas within a deployment need no handling: Fireworks aggregates them server-side via the :sum_by_deployment / :avg_by_deployment metric suffixes.
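
As a rough illustration of the admission change, the sketch below contrasts best-of and worst-of aggregation over per-deployment statuses. The helper names `combineBestOf` and `combineWorstOf` are hypothetical and are not taken from the PR; treating an empty list as healthy is an assumption based on the empty-list guard the review mentions.

```typescript
type Health = "healthy" | "degraded" | "unhealthy";

// Hypothetical helper: best-of aggregation (the new behaviour described above).
function combineBestOf(statuses: Health[]): Health {
  // Assumption: with no deployments to check, do not block admission.
  if (statuses.length === 0) return "healthy";
  if (statuses.some((s) => s === "healthy")) return "healthy";
  if (statuses.some((s) => s === "degraded")) return "degraded";
  return "unhealthy";
}

// Hypothetical helper: worst-of aggregation (the previous behaviour), for comparison.
function combineWorstOf(statuses: Health[]): Health {
  if (statuses.some((s) => s === "unhealthy")) return "unhealthy";
  if (statuses.some((s) => s === "degraded")) return "degraded";
  return "healthy";
}
```

With statuses `["unhealthy", "healthy"]`, best-of yields `"healthy"` while worst-of yields `"unhealthy"`, which is the case where one model's degraded deployment used to block everyone.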

Test plan

fireworks-health.test.ts — any-healthy, all-degraded, all-unhealthy cases
fireworks-deployment.test.ts — per-deployment cooldown isolation + existing fallback cases (21 tests)
tsc --noEmit clean

🤖 Generated with Claude Code

Admit from the waiting room if any deployment is healthy (was worst-of across all). With one deployment per model (and per country in the future), a degraded deployment for one model shouldn't block users whose model routes elsewhere. Also make the DEPLOYMENT_SCALING_UP cooldown per-deployment; one deployment's 503 no longer poisons routing for the others.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

greptile-apps · 2026-04-19T22:37:19Z

Greptile Summary

This PR refactors Fireworks deployment health checking and cooldown tracking to be per-deployment rather than global, so that a degraded or scaling-up deployment for one model doesn't block routing to other models.

Key changes:

fireworks-health.ts (classify): Switched from "worst-of" to "best-of" semantics — the waiting room now admits users if any deployment is healthy, and only blocks all users when every deployment is non-healthy.
fireworks.ts: Replaced the single global deploymentScalingUpUntil timestamp with a Map<string, number> keyed by deployment path, so the cooldown triggered by a DEPLOYMENT_SCALING_UP 503 on one deployment doesn't bleed into other deployments.
Tests: Added isolation tests for per-deployment cooldown and three new health classification scenarios (any-healthy, all-degraded, all-unhealthy).
.gitignore: Added .gstack/.
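
A minimal sketch of what a per-deployment cooldown keyed by a `Map<string, number>` might look like. The function names come from this review, but their signatures, the optional `now` parameter, the optional argument to `resetDeploymentCooldown`, and the `COOLDOWN_MS` value are all assumptions made for illustration; the actual code in `fireworks.ts` may differ.

```typescript
// Assumed cooldown length; the PR does not state the real value.
const COOLDOWN_MS = 60_000;

// Deployment path -> timestamp (ms) until which that deployment is skipped.
const deploymentScalingUpUntil = new Map<string, number>();

// Record that one deployment returned DEPLOYMENT_SCALING_UP.
function markDeploymentScalingUp(deploymentId: string, now: number = Date.now()): void {
  deploymentScalingUpUntil.set(deploymentId, now + COOLDOWN_MS);
}

// True only while this specific deployment's cooldown is still running.
function isDeploymentCoolingDown(deploymentId: string, now: number = Date.now()): boolean {
  const until = deploymentScalingUpUntil.get(deploymentId);
  return until !== undefined && now < until;
}

// Clear one deployment's cooldown, or all of them when no id is given (assumed behaviour).
function resetDeploymentCooldown(deploymentId?: string): void {
  if (deploymentId === undefined) {
    deploymentScalingUpUntil.clear();
  } else {
    deploymentScalingUpUntil.delete(deploymentId);
  }
}
```

Because each entry is looked up by its own key, marking deployment `a` leaves `isDeploymentCoolingDown("b")` false, which is the isolation property the new tests are described as covering.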

Confidence Score: 5/5

Safe to merge — changes are well-scoped, backwards-compatible, and covered by new and updated tests.

The logic change in classify is straightforward and the best-of-any semantics align precisely with the deployment architecture (one deployment per model). The per-deployment cooldown Map is a clean, minimal refactor. tsc is clean, all 21 tests pass, and the three new health scenarios and two new cooldown-isolation tests provide solid coverage of the new behaviour. No security implications, no data-loss risk, and the fallback path in createFireworksRequestWithFallback already handles any individual deployment failure gracefully.
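
To make the fallback behaviour concrete, here is a small sketch of the decision taken after a custom-deployment request returns. It is not the body of `createFireworksRequestWithFallback`; the `DeploymentResult` shape, the `Action` labels, and the helper name `decideAfterDeploymentCall` are hypothetical, and passing non-5xx responses straight through is an assumption.

```typescript
// Hypothetical shape of a deployment response, for illustration only.
interface DeploymentResult {
  status: number;
  errorCode?: string;
}

type Action = "return-response" | "fallback" | "mark-cooldown-and-fallback";

// Decide what to do with the result of a custom-deployment request.
function decideAfterDeploymentCall(result: DeploymentResult): Action {
  // A scaling-up 503 starts that deployment's cooldown, then falls back.
  if (result.status === 503 && result.errorCode === "DEPLOYMENT_SCALING_UP") {
    return "mark-cooldown-and-fallback";
  }
  // Any other 5xx falls back to the standard API without a cooldown.
  if (result.status >= 500) return "fallback";
  // Assumption: everything else is returned to the caller as-is.
  return "return-response";
}
```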

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| web/src/server/free-session/fireworks-health.ts | Rewrites `classify` to best-of-any semantics; adds empty-list guard that complements the one already in `probe()`; logic is clean and well-reasoned. |
| web/src/llm-api/fireworks.ts | Replaces single global cooldown scalar with a per-deployment `Map`; `!!deploymentModelId` coerces to boolean for the composite boolean expression; all three cooldown functions updated consistently. |
| web/src/llm-api/tests/fireworks-deployment.test.ts | All existing call sites updated to pass `deploymentId`; two new isolation tests cover cross-deployment independence and selective `resetDeploymentCooldown`. |
| web/src/server/free-session/tests/fireworks-health.test.ts | Old worst-of test replaced with three focused scenarios; data setup is correct for each expected outcome. |
| .gitignore | Adds `.gstack/` directory to gitignore; routine developer tooling exclusion. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Waiting Room: getFireworksHealth] --> B[probe: fetch Prometheus metrics]
    B --> C{deploymentIds empty?}
    C -- yes --> D[return 'healthy']
    C -- no --> E[classify samples, deploymentIds]

    E --> F{for each deploymentId\nclassifyOne}
    F --> G{any 'healthy'?}
    G -- yes --> D
    G -- no --> H{any 'degraded'?}
    H -- yes --> I[return 'degraded'\ndo NOT admit]
    H -- no --> J[return 'unhealthy'\ndo NOT admit]

    subgraph classifyOne
        K[KV blocks >= 0.98?] -- yes --> L[unhealthy]
        K -- no --> M[5xx rate >= 10%?]
        M -- yes --> L
        M -- no --> N[prefill p90 > 1000ms?]
        N -- yes --> O[degraded]
        N -- no --> P[KV blocks >= 0.80?]
        P -- yes --> O
        P -- no --> Q[healthy]
    end

    subgraph createFireworksRequestWithFallback
        R[isDeploymentCoolingDown deploymentId] -- cooling down --> S[standard Fireworks API]
        R -- not cooling down --> T[custom deployment request]
        T -- 503 DEPLOYMENT_SCALING_UP --> U[markDeploymentScalingUp deploymentId\ncooldown per-deployment Map]
        U --> S
        T -- other 5xx --> S
        T -- success --> V[return response]
    end
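
The `classifyOne` thresholds in the flowchart above can be sketched as a single function. The `DeploymentMetrics` interface and its field names are hypothetical stand-ins for however the real code reads the Prometheus samples; only the thresholds and their order are taken from the flowchart.

```typescript
type Health = "healthy" | "degraded" | "unhealthy";

// Hypothetical per-deployment metrics; field names are illustrative only.
interface DeploymentMetrics {
  kvBlocksFraction: number; // KV block utilisation, 0..1
  errorRate5xx: number; // share of requests returning 5xx, 0..1
  prefillP90Ms: number; // p90 prefill latency in milliseconds
}

function classifyOne(m: DeploymentMetrics): Health {
  // Hard limits first: either one makes the deployment unhealthy.
  if (m.kvBlocksFraction >= 0.98) return "unhealthy";
  if (m.errorRate5xx >= 0.1) return "unhealthy";
  // Softer limits: the deployment is degraded but still serving.
  if (m.prefillP90Ms > 1000) return "degraded";
  if (m.kvBlocksFraction >= 0.8) return "degraded";
  return "healthy";
}
```

Checking the unhealthy conditions before the degraded ones means a deployment that trips both sets of limits is reported with the more severe status.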

Reviews (1): Last reviewed commit: "Support multiple Fireworks deployments i..."

jahooma requested review from brandonkachen and charleslien as code owners April 19, 2026 22:35

jahooma closed this Apr 20, 2026

jahooma wants to merge 1 commit into main from jahooma/fireworks-multi-deploy
